I chose to analyze a dataset of all the items listed under “Harry Potter” in the Seattle Public Library System. This data included the material type of each item, the number of checkouts per month/year, the creator, and the title of each item. For the most part, I analyzed trends having to do with print books with J.K. Rowling listed as the creator.
I wanted to analyze this data because Harry Potter is one of the most well-known, if not the most well-known, series of all time. I thought it would be interesting to see if any insights on reader loyalty would emerge in terms of reading all the way through the books.
I was also curious whether the controversies surrounding J.K. Rowling, and so-called “cancel culture” more broadly, have had an observable effect on checkouts of her books.
There are 4,078 print books listed under J.K. Rowling. The book with the most checkouts in a single month is Harry Potter and the Deathly Hallows with 341 checkouts in August of 2007 (note: Deathly Hallows was released the month prior). August of 2007 also had the most recorded checkouts across all Harry Potter titles. May of 2020 had the fewest checkouts across all Harry Potter titles. For print books, Harry Potter and the Sorcerer’s Stone has been checked out the least with 5,416 total checkouts, while Harry Potter and the Deathly Hallows has been checked out the most with 11,178 total checkouts.
The data used was collected from Seattle Public Libraries’ open checkout data.
Parameters included in the dataset are checkout month and year, material, number of checkouts, title, ISBN, creator, subjects, publisher, publication year, usage class (i.e., physical or digital), and checkout type (i.e., the system used to check out the item).
Seattle Public Libraries began officially collecting the data in 2017, pulling data using George Legrady’s art installation “Making the Invisible Visible,” which visualizes the libraries’ checkout data (scrubbed of any user-identifying information). The installation provided checkout data going all the way back to 2005 when it was first installed.
Seattle Public Libraries began publishing the data in 2017 as part of Seattle’s Open Data Initiative, which promotes transparency and encourages public/private collaboration.
Something to consider when working with this data is who the data represents and who it does not. For example, who is or isn’t able to obtain a Seattle Public Library card? Who is able to travel to the physical library locations and navigate the library system? Who is able to access and navigate the online library system? These are all things to consider when analyzing the data.
One challenge posed by this dataset stems from its size and the amount of detail given to each title. Because author formatting and title listings vary from record to record, there is no clear-cut way to filter by author or even by book title. For instance, Harry Potter and the Sorcerer’s Stone might be listed under several different names (e.g., Harry Potter and the Philosopher’s Stone), and J.K. Rowling might be listed as a creator for a title with a different author. Because of the size of this dataset and the limits of my coding skills, it would be difficult to sort through the data to catch all of these differences. Nonetheless, my methods for filtering the data are included in my scripts, and I’ve done my best to address areas where this limitation may cause problems.
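As an illustration of the title and creator variations described above, here is a minimal sketch of one possible normalization approach. It is not the filtering used in the scripts for this report. The mapping `CANONICAL_TITLES` and the helper names are hypothetical, and the sample strings are invented examples of how a title or creator might appear, not records from the dataset.

```python
import re

# Hypothetical mapping from a distinctive phrase to one canonical title.
# Both the US and UK names of the first book point to the same entry.
CANONICAL_TITLES = {
    "sorcerer's stone": "Harry Potter and the Sorcerer's Stone",
    "philosopher's stone": "Harry Potter and the Sorcerer's Stone",
    "chamber of secrets": "Harry Potter and the Chamber of Secrets",
    "prisoner of azkaban": "Harry Potter and the Prisoner of Azkaban",
    "goblet of fire": "Harry Potter and the Goblet of Fire",
    "order of the phoenix": "Harry Potter and the Order of the Phoenix",
    "half-blood prince": "Harry Potter and the Half-Blood Prince",
    "deathly hallows": "Harry Potter and the Deathly Hallows",
}


def normalize(text):
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()


def canonical_title(raw_title):
    """Return the canonical series title, or None if no phrase matches."""
    cleaned = normalize(raw_title)
    for phrase, canonical in CANONICAL_TITLES.items():
        if phrase in cleaned:
            return canonical
    return None


def is_rowling(raw_creator):
    """Loose match on the surname, since name formatting varies."""
    return "rowling" in normalize(raw_creator)


# Invented example strings, for illustration only.
print(canonical_title("Harry Potter and the Philosopher\u2019s Stone / J.K. Rowling"))
print(is_rowling("Rowling, J. K."))
```

Substring matching like this is deliberately loose. It would also catch companion titles that mention one of these phrases (for example, a film guide), so the matches would still need a manual check.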
Another challenge to consider is the effect of COVID-19 on checkout patterns. For instance, June of 2020 shows a total of 10 checkouts across all titles and formats. This is up from 2 checkouts the previous month, but down from 749 checkouts in June of 2019. It will be difficult to analyze data from this time period, as any outliers may be attributed to the pandemic and its resulting effects on library policy, as well as on people’s access to the libraries. This is complicated by the fact that the height of the controversy surrounding J.K. Rowling emerged in June of 2020. Thus it may be difficult to find reliable evidence of whether checkouts were truly impacted or not.
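To put the June figures in proportion, the short calculation below works out the percent change using only the three counts quoted above. The function name `percent_change` is just a label chosen for this sketch.

```python
def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (new - old) / old * 100


# Total checkouts across all titles and formats, as quoted in the text.
june_2019 = 749
may_2020 = 2
june_2020 = 10

year_over_year = percent_change(june_2019, june_2020)
month_over_month = percent_change(may_2020, june_2020)

print(f"June 2019 to June 2020: {year_over_year:.1f}%")   # -98.7%
print(f"May 2020 to June 2020: {month_over_month:.1f}%")  # 400.0%
```

A drop of nearly 99% year over year is far larger than anything a change in reader sentiment alone would plausibly produce, which is why the pandemic has to be treated as the dominant factor for this period.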
I included this chart for two reasons. The first was to see whether there was a noticeable change in the number of checkouts following J.K. Rowling’s transphobic comments in June of 2020. The second was to see how the popularity of each medium has changed over time.
In regard to the first query, it was difficult to glean any insights due to the (presumed) effects of the pandemic during the same time period. One relevant observation is that the number of checkouts after June of 2020 has not returned to the levels seen before June of 2020. Determining whether that is due to ongoing effects of the pandemic or to a deterring effect of her comments would likely require further research.
Regarding the second query, no particularly notable insights emerged about changes in medium popularity. Sound discs generally declined in popularity after 2008, and all mediums declined sharply in May of 2020 when pandemic-related restrictions began kicking in.
I included this chart because I was curious whether the number of checkouts for the first and last books in the Harry Potter series matched up and whether there was any noticeable delay between the two. For example, would a high number of checkouts of Harry Potter and the Sorcerer’s Stone in January 2015 translate to a (similarly) high number of checkouts of Harry Potter and the Deathly Hallows in, say, February 2015?
Based on the chart, I wouldn’t say there were any noticeable patterns regarding the above query. One interesting detail from the chart is that the number of checkouts for the first and last books in the series did not start matching up until around late 2015 to early 2016, roughly eight years after the publication of the final book.
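The delay question could also be checked numerically rather than by eye. Below is a minimal sketch of a lagged correlation, which compares checkouts of the first book in month t with checkouts of the last book in month t + lag. This was not part of the analysis above. The function names are hypothetical, and the two series are made-up numbers constructed so that the second trails the first by two months.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlations(first_book, last_book, max_lag):
    """Correlate first_book[t] with last_book[t + lag] for each lag in 0..max_lag.

    Both series must cover the same months, in order, with no gaps.
    """
    results = {}
    for lag in range(max_lag + 1):
        xs = first_book[: len(first_book) - lag]
        ys = last_book[lag:]
        results[lag] = pearson(xs, ys)
    return results


# Made-up monthly checkout counts, for illustration only.
first = [10, 50, 20, 80, 30, 60, 15, 40]
last = [5, 7, 10, 50, 20, 80, 30, 60]  # same shape as `first`, two months later

correlations = lagged_correlations(first, last, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(best_lag)
```

With real data, a clear peak at some lag would suggest readers reaching the last book that many months after starting the first. A flat set of correlations would be consistent with the absence of a pattern noted above.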
I included this chart because I was curious whether the total number of checkouts was roughly the same for each book in the series following the publication of the final book, as this might provide further insight into whether readers were reading all the way through the series.
The chart shows, however, that the book checked out the least since the final book’s publication was the first book in the series, Harry Potter and the Sorcerer’s Stone, and the book checked out the most was the final book, Harry Potter and the Deathly Hallows. Interestingly, the number of checkouts for books 2-6 descends in order of book number.
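For reference, a per-book total like the one charted here can be computed with a simple grouped sum. The sketch below assumes each record is a dictionary with `Title`, `CheckoutYear`, `CheckoutMonth`, and `Checkouts` keys, and that titles have already been cleaned to one name per book. The function name `totals_since` is hypothetical, and the rows are made-up values rather than figures from the dataset.

```python
from collections import defaultdict


def totals_since(rows, start_year, start_month):
    """Sum checkouts per title for records on or after the given month.

    Returns (title, total) pairs sorted from most to least checked out.
    """
    totals = defaultdict(int)
    for row in rows:
        # Tuples compare year first, then month.
        if (row["CheckoutYear"], row["CheckoutMonth"]) >= (start_year, start_month):
            totals[row["Title"]] += row["Checkouts"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up rows, for illustration only.
rows = [
    {"Title": "Book A", "CheckoutYear": 2007, "CheckoutMonth": 6, "Checkouts": 100},
    {"Title": "Book A", "CheckoutYear": 2007, "CheckoutMonth": 8, "Checkouts": 30},
    {"Title": "Book B", "CheckoutYear": 2007, "CheckoutMonth": 8, "Checkouts": 90},
    {"Title": "Book B", "CheckoutYear": 2008, "CheckoutMonth": 1, "Checkouts": 20},
]

# Count only checkouts from July 2007 onward.
ranking = totals_since(rows, 2007, 7)
print(ranking)
```

Because the cutoff excludes earlier months, a book with a long history before the cutoff can rank lower than its all-time total would suggest, as "Book A" does in this toy example.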